Real-Time Collection of Reliable and Representative Tweets Datasets Related to News Events

Authors

  • Béatrice Mazoyer
  • Julia Cagé
  • Céline Hudelot
  • Marie-Luce Viaud
Abstract

This paper is part of a wider work studying the co-influences of Twitter and the production of information by traditional media. A strong prerequisite of this study is to collect, within the limitations of the Twitter API, tweets linked to media events that are representative of the real Twitter activity. This paper describes two proposed approaches to this task. The first one, inspired by information retrieval, focuses on query formulation. It consists of bridging the vocabulary gap between traditional news articles and tweets by iteratively modifying the queries sent to the Twitter API, depending on the tweets retrieved by previous queries. The second approach consists of streaming a representative sample of all emitted tweets and dynamically clustering them into events. We also discuss approaches to evaluating the collected datasets in terms of how representative they are of the real activity on Twitter.
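To make the first approach more concrete, the following is a minimal sketch of iterative query expansion. It is not the authors' implementation: the abstract does not specify how new query terms are chosen, so the selection rule here (terms that appear in at least `min_df` retrieved tweets) is an assumption, as are all names (`expand_query`, `fake_search`, `CORPUS`). The Twitter API is replaced by a hypothetical in-memory search function so that the example is self-contained.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into words, keeping # and @ prefixes."""
    return [t for t in re.findall(r"[#@]?\w+", text.lower()) if len(t) > 2]

def expand_query(seed_terms, search, rounds=3, top_k=1, min_df=2):
    """Iteratively grow a query from the tweets retrieved by earlier queries.

    `search` is a hypothetical callable standing in for an API call: it takes
    a list of terms and returns a dict {tweet_id: text}.
    """
    query = list(dict.fromkeys(seed_terms))  # dedupe, keep order
    collected = {}
    for _ in range(rounds):
        collected.update(search(query))
        # Document frequency of each term not already in the query.
        df = Counter(
            tok
            for text in collected.values()
            for tok in set(tokenize(text))
            if tok not in query
        )
        # Assumed selection rule: most frequent terms seen in >= min_df tweets,
        # ties broken alphabetically so the result is deterministic.
        ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
        additions = [term for term, n in ranked if n >= min_df][:top_k]
        if not additions:
            break  # no new vocabulary found, stop querying
        query.extend(additions)
    return query, collected

# Toy corpus standing in for Twitter (illustrative data, not real tweets).
CORPUS = {
    1: "Earthquake hits Lisbon this morning #lisbonquake",
    2: "Huge earthquake felt downtown #lisbonquake",
    3: "#lisbonquake aftershocks reported near the harbor",
    4: "New phone released today",
}

def fake_search(terms):
    """Return every toy tweet that contains at least one query term."""
    wanted = set(terms)
    return {tid: text for tid, text in CORPUS.items()
            if wanted & set(tokenize(text))}

final_query, collected = expand_query(["earthquake"], fake_search)
print(final_query, sorted(collected))
```

In this toy run, the seed term taken from news vocabulary ("earthquake") retrieves two tweets that share a hashtag. Adding that hashtag to the query then retrieves a third tweet that never uses the seed term, which is the vocabulary gap the abstract refers to.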


Similar resources

TEA: Episode Analytics on Short Messages

Twitter is a widely used micro-blogging service which, in recent times, has become a reliable source of news as it happens around the world [11]. Breaking news is covered on Twitter, with the magnitude and volume of tweets reflecting the nature and intensity of the news. During events, many tweets are posted either expressing sentiments about the event or just about the occurrence of the event. Eve...


Time-based Microblog Distillation

This paper presents a simple approach for identifying relevant and reliable news from the Twitter stream as soon as it emerges. The approach is based on a near-real-time system for sentiment analysis on Twitter, implemented by Fondazione Ugo Bordoni, and suitably modified to detect the most representative tweets in a specified time slot. This work represents a first step towards the...


Real-time Topic Detection with Bursty N-grams

Twitter is becoming an ever more popular platform for discovering and sharing information about current events, both personal and global. The scale and diversity of messages makes the discovery and analysis of breaking news very challenging. Nonetheless, journalists and other news consumers are increasingly relying on tools to help them make sense of Twitter. Here, we describe a fully-automated...


From Tweets to Events: Exploring a Scalable Solution for Twitter Streams

The unprecedented use of social media through smartphones and other web-enabled mobile devices has enabled the rapid adoption of platforms like Twitter. Event detection has found many applications on the web, including breaking news identification and summarization. The recent increase in the usage of Twitter during crises has attracted researchers to focus on detecting events in tweets. Howeve...


Graph-based Method for Summarized Storyline Generation in Twitter

Twitter has become a leading source of real-time world-wide information and a great medium for exploring emerging events, breaking news and general topics that matter most to a broad audience. On the other hand, the explosive rate of incoming information on Twitter leads users to experience information overload. Whereas, a significant fraction of tweets are about news events, summarizing the s...




Publication date: 2018